ABSTRACT

We present a statistical analysis of different metrics char- 
acterizing the topological properties of Internet maps, col- 
lected at two different resolution scales: the router and the 
autonomous system level. The metrics we consider allow 
us to confirm the presence of scale-free signatures in several 
statistical distributions, as well as to show in a quantitative 
way the hierarchical nature of the Internet. Our findings 
are relevant for the development of more accurate Internet 
topology generators, which should include, along with the 
scale-free properties of the connectivity distribution, the hi- 
erarchical signatures unveiled in the present work. 

1. INTRODUCTION 

The relentless growth of the Internet goes along with a wide range of internetworking problems related to routing protocols, resource allocation, and physical connectivity plans. The study and optimization of algorithms and policies related to such problems rely heavily on theoretical analysis and simulations that use model abstractions of the actual Internet. In order to extract the maximum benefit from these studies, however, it is necessary to work with reliable Internet topology generators. The first priority in this respect is to define as accurately as possible the topology of the network being simulated. This implies characterizing how routers, hosts, and physical links interconnect with each other in shaping the actual Internet.

In recent years, research groups have started to deploy technologies and infrastructures in order to obtain a more detailed picture of the Internet. Several studies aimed at tracking and visualizing the large-scale topology and/or performance of the Internet are leading to Internet mapping projects at different resolution scales. These projects typically collect data on Internet elements (routers, domains) and the connections among them (physical links, peer connections), in order to create a graph-like representation of large parts of the Internet in which the nodes represent those elements and the links represent the respective connections. Mapping projects focus essentially on two levels of topological description. First, by inferring router adjacencies it has been possible to measure the Internet router (IR) level topology. Second, the autonomous system (AS) level topology has been measured, with the connectivity obtained from AS routing path information. Although these two representations are related, it is clear that they describe the Internet at rather different scales. In fact, each AS groups a generally large number of routers, and therefore the AS maps are in some sense a coarse-grained view of the IR maps.

Internet maps exhibit an extremely large degree of heterogeneity, and the use of statistical tools becomes mandatory to provide a proper mathematical characterization of this system. Statistical analyses of the Internet fabric have revealed, to the surprise of many researchers, a very complex connectivity pattern with fluctuations extending over several orders of magnitude [...]. In particular, power-law behavior has been observed in metrics and statistical distributions of Internet maps at different levels [...]. This evidence makes the Internet an example of the so-called scale-free networks [...] and uncovers a peculiar structure that cannot be satisfactorily modeled with traditional topology generators. Previous Internet topology generators, based on the classical Erdős and Rényi random graph model [11], [12] or on hierarchical models, yielded an exponentially bounded connectivity pattern, with very small fluctuations, in clear disagreement with the recent empirical findings. A theoretical framework for the origin of scale-free graphs has been put forward by Barabási and Albert [...] by devising a novel class of dynamical growing networks. Following these ideas, several Internet topology generators yielding power-law distributions have subsequently been proposed [...].

Data gathering projects [...] are progressively making available larger AS and IR level maps, which are amenable to more accurate statistical analysis and raise new and challenging questions about the Internet topology. For instance, statistical distributions show deviations from pure power-law behavior, and it is important to understand to what extent the Internet can be considered a scale-free graph. How these scaling anomalies (usually signaled by the presence of cut-offs in the corresponding statistical distributions) are related to the finite size of the Internet and to physical constraints is a central issue in the characterization of the Internet and in the understanding of the dynamics underlying its growth. A further important issue concerns the fact that the Internet is organized on different hierarchical levels, with a set of backbone links carrying the traffic between local area providers. This structure is reflected in a hierarchical arrangement of administrative domains and in a different usage of links and connectivity of nodes. The interplay between the scale-free nature and the hierarchical properties of the Internet is still unclear, and it is an important task to find metrics that can exploit and characterize hierarchical features at the AS and IR levels. Finally, although one would expect Internet AS and IR level maps to exhibit similar scale-free properties, the different resolution of the two kinds of maps might lead to different metric properties.

In this paper we present a detailed statistical analysis of large AS and IR level maps [...]. We study the scale-free properties of these maps, focusing on the degree and betweenness distributions. While scale-free properties are confirmed for maps at both levels, IR level maps also show the presence of an exponential cut-off, which can be related to constraints acting on the physical connectivity and load of routers. Power-law distributions with a cut-off are a general feature of scale-free phenomena in real finite systems, and we discuss their origin in the framework of growing networks. At the AS level we confirm the presence of a strong scale-free character in the large-scale degree and betweenness distributions. We also argue that the deviations from pure power-law behavior found at intermediate connectivities in recent maps [...] have a marginal impact on the resilience and information spreading properties of the Internet.

Furthermore, we propose two metrics, based on the connectivity and clustering correlation functions, that appear to sharply characterize the hierarchical properties of Internet maps. In particular, these metrics clearly distinguish between the AS and IR levels, which behave very differently in this respect. While IR level maps appear to possess almost no hierarchical structure, AS maps fully reflect the hierarchy of domains around which the Internet revolves. The differences highlighted between the two levels might be very important in the development of faithful Internet topology generators: the testing of Internet protocols working at different levels might require topology generators that account for the different properties observed. Hierarchical features are also important to scrutinize theoretical models proposing new dynamical growth mechanisms for the Internet as a whole.

2. INTERNET MAPS 

Nowadays the Internet can be partitioned into autonomously administered domains which vary in size, geographical extent, and function. Each domain may exercise traffic restrictions or preferences, and handle internal traffic according to particular autonomous policies. This fact has stimulated the separation of inter-domain routing from intra-domain routing, and the introduction of the Autonomous System Number (ASN). Each AS refers to one single administrative domain of the Internet. Within each AS, an Interior Gateway Protocol is used for routing purposes. Between ASs, an Exterior Gateway Protocol provides the inter-domain routing system. The Border Gateway Protocol (BGP) is the most widely used inter-domain protocol. In particular, it uses a 16-bit ASN to identify, and refer to, each AS.

The Internet is usually portrayed as an undirected graph. Depending on the meaning assigned to the nodes and links of the associated graph, we can obtain different levels of representation, each one corresponding to a different degree of coarse-graining with respect to the physical Internet.

Internet Router level: In the IR level maps, nodes represent the routers, while links represent the physical connections among them. In general, all mapping efforts at the IR level are based on computing router adjacencies from traceroute sequences sent to a list of networks in the Internet. The traceroute command performed from a single source provides a spanning tree from that source to every other (reachable) node in the network. By merging the information obtained from different sources it is possible to construct IR level maps of different portions of the Internet. In order to catch all the various cross-links, however, a large number of source probes is needed. In addition, the instability of paths between routers and other technical problems, such as multiple alias interfaces, make the mapping a very difficult task [...]. These difficulties have been tackled in different ways by the various Internet mapping projects: the Lucent project at Bell Labs [...], the Cooperative Association for Internet Data Analysis (CAIDA) [17], and the SCAN project at the Information Sciences Institute [13], which develop methods to obtain partial maps from a single source.
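The merging step described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any of the projects mentioned: it assumes that each probe has already been reduced to a list of router identifiers ordered from the source outward, and it ignores the alias-resolution and path-instability problems noted in the text. The source and router names (S1, S2, a, ..., d) are hypothetical.

```python
from collections import defaultdict

def merge_paths(paths):
    """Merge hop sequences into an undirected adjacency map.

    Each path is a list of router identifiers ordered from the probing
    source outward (a hypothetical, already-cleaned input format).
    Consecutive hops become links; identical consecutive hops are skipped.
    """
    adj = defaultdict(set)
    for path in paths:
        for u, v in zip(path, path[1:]):
            if u != v:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def num_edges(adj):
    """Each undirected link appears in two adjacency sets."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

# Probes from a single source S1 only reveal a tree rooted at S1 ...
from_s1 = [["S1", "a", "b", "d"], ["S1", "a", "c"]]
# ... while a second source exposes the cross-link c-d.
from_s2 = [["S2", "c", "d"], ["S2", "c", "a"]]

single = merge_paths(from_s1)
merged = merge_paths(from_s1 + from_s2)
print(num_edges(single), num_edges(merged))  # 4 6
```

The single-source map has 5 nodes and 4 links, i.e. it is a tree; only the second vantage point reveals the cross-link between c and d, which is why many sources are needed.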

Autonomous System level: In the AS level graphs each node represents an AS, while each link between two nodes represents the existence of a BGP peer connection between the corresponding ASs. It is important to stress that each AS groups many routers together, and the traffic carried by a link is the aggregation of all the individual end-host flows between the corresponding ASs. The AS map can be constructed by looking at BGP routing tables. In fact, the BGP routing table of each AS contains a spanning tree from that node to every other (reachable) AS. We can then try to reconstruct the complete AS map by merging the connectivity information coming from a certain fraction of these spanning trees. This method has actually been used by the National Laboratory for Applied Network Research (NLANR) [...], using the BGP routing tables collected at the Oregon route server, which has gathered BGP-related information since 1997. Enriched maps can be obtained from other public sources, such as Looking Glass sites and the Réseaux IP Européens (RIPE) [...], which add about 40% more AS-AS connections.
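The AS-level reconstruction can be sketched in the same spirit. The minimal example below assumes that each routing-table entry has already been reduced to its AS path, given as a list of ASNs; the paths are hypothetical and use ASNs from the 16-bit private-use range. Repeated consecutive ASNs, which arise from AS-path prepending, are collapsed so that they do not create self-loops; AS sets and all other BGP attributes are ignored.

```python
def as_graph(as_paths):
    """Merge AS paths (lists of ASNs) into a set of undirected AS-AS edges.

    Consecutive duplicate ASNs (AS-path prepending) are collapsed first,
    and each edge is stored as a sorted tuple so (a, b) == (b, a).
    """
    edges = set()
    for path in as_paths:
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        for a, b in zip(hops, hops[1:]):
            edges.add((min(a, b), max(a, b)))
    return edges

# Hypothetical AS paths seen in one set of routing tables (note the prepending).
view_1 = [[64512, 64513, 64514], [64512, 64515, 64515, 64515, 64516]]
# Hypothetical extra paths from an additional source.
view_2 = [[64514, 64516], [64513, 64512]]

base = as_graph(view_1)
enriched = as_graph(view_1 + view_2)
print(len(base), len(enriched))  # 4 5
```

As with the enriched AS+ map, the additional source contributes only the links not already present in the base view.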

These graph representations do not model individual hosts, which are too numerous, and neglect link properties such as bandwidth, actual data load, or geographical distance. For these reasons, the graph-like representation must be considered as an overlay of the basic topological structure: the skeleton of the Internet. Moreover, the data collected for the two levels are different, and both representations may be incomplete or partial to different degrees. In particular, measurements may not capture all the nodes present in the actual network and, more often, they do not include all the links among nodes. It is not our purpose here to argue about the reliability of the different maps. However, the conclusions we shall present in this paper seem rather stable in time for the different maps. Hopefully, this means that, despite their different degrees of completeness, the present maps represent a fairly good statistical sampling of the Internet as a whole. In particular, we shall use the map collected during October/November 1999 by the SCAN project with the Mercator software as representative of the Internet router level. At the autonomous system level we consider the map collected at the Oregon route server (AS) and the enriched map (AS+) (available at [...]), both dated May 25, 2001.

3. AVERAGE PROPERTIES 

We start our study by analyzing some standard metrics: the total number of nodes $N$ and edges $E$, the node connectivity $k_i$, the minimum path distance between pairs of nodes $d_{ij}$, the clustering coefficient $c_i$, and the betweenness $b_i$. The connectivity $k_i$ of a node is defined as the number of edges incident to that node, i.e. the number of connections of that node with other nodes in the network. If nodes $i$ and $j$ are connected we will say that they are nearest neighbors. The minimum path distance $d_{ij}$ between a pair of nodes $i$ and $j$ is defined as the minimum number of nodes traversed by a path that goes from one node to the other. The clustering coefficient $c_i$ of node $i$ is defined as the ratio between the number of edges $e_i$ in the subgraph identified by its nearest neighbors and its maximum possible value $k_i(k_i-1)/2$, corresponding to a complete subgraph, i.e. $c_i = 2e_i/[k_i(k_i-1)]$. This quantity measures the tendency of two nodes connected to the same node to be also connected to each other. The clustering coefficient $c_i$ takes values of order $O(1)$ for grid networks. On the other hand, for random graphs [11], [12], which are constructed by connecting each pair of nodes at random with probability $p$, the clustering coefficient equals $p$, which is of order $O(N^{-1})$ for sparse graphs with a fixed average connectivity $\langle k \rangle = p(N-1)$. Finally, the betweenness $b_i$ of a node $i$ is defined as the total number of minimum paths that pass through that node. It gives a measure of the amount of traffic that goes through a node, if the minimum path distance is considered as the metric defining the optimal path between pairs of nodes. The average values of these metrics over every node (or every pair of nodes, for $d_{ij}$) in the AS, AS+, and IR maps are given in Table 1.
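To make these definitions concrete, the following minimal pure-Python sketch computes $\langle k \rangle$, $\langle d \rangle$ and $\langle c \rangle$ for a small toy graph stored as adjacency sets. The function names and the toy graph are our own illustration; distances are measured in hops with breadth-first search, and nodes with fewer than two neighbors are assigned $c_i = 0$, a common convention that the text does not specify.

```python
from collections import deque
from itertools import combinations

def clustering(adj, i):
    """c_i = 2 e_i / (k_i (k_i - 1)); nodes with k_i < 2 get c_i = 0 by convention."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    # e_i: number of links among the nearest neighbors of i.
    e = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return 2.0 * e / (k * (k - 1))

def bfs_distances(adj, source):
    """Minimum path distances (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_metrics(adj):
    """Return (<k>, <d>, <c>); <d> is averaged over pairs of mutually reachable nodes."""
    n = len(adj)
    k_avg = sum(len(nbrs) for nbrs in adj.values()) / n
    c_avg = sum(clustering(adj, i) for i in adj) / n
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return k_avg, total / pairs, c_avg

# Toy graph: a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
adj = {i: set() for i in range(5)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(average_metrics(adj))
```

For this toy graph $\langle k \rangle = 2E/N = 2$, $\langle d \rangle = 17/10$ and $\langle c \rangle = 7/15$. The same identity $\langle k \rangle = 2E/N$ reproduces the $\langle k \rangle$ column of Table 1 from its $N$ and $E$ columns. One BFS per node costs $O(NE)$ overall, which is heavy but still feasible for maps of the size considered here.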

The average connectivity of the three maps is of order $O(1)$; therefore, they can be considered sparse graphs. Despite the small average connectivity, however, the average minimum path distance is also very small compared to the size of the maps. The probability distribution of the minimum path distance, $p_d = \mathrm{Prob}[d_{ij} = d]$, is shown in Fig. 1. For all maps this distribution is sharply peaked around the average value $\langle d \rangle$; therefore, we can take $\langle d \rangle$ as the characteristic minimum path distance.

Map    N        E        $\langle k\rangle$   $\langle d\rangle$   $\langle c\rangle$   $\langle b\rangle/N$
IR     228298   320105   2.80                 9.51                 0.03                 [illegible]
AS     11174    23367    4.18                 3.62                 0.22                 3.61
AS+    11461    32711    5.71                 3.56                 0.24                 3.56

Table 1: Average metrics of the AS, AS+, and IR maps. See text for the metrics' definitions.

Figure 1: Probability distribution $p_d = \mathrm{Prob}[d_{ij} = d]$ of the minimum path distance between nodes, for the AS, AS+, and IR maps.

In the next section we will show
that this is not the case for the connectivity, which is characterized by large fluctuations from node to node. Thus, the Internet strikingly exhibits what is known as the "small-world" effect [24]: on average, one can go from one node to any other in the system passing through a very small number of intermediate nodes. Since the network is sparse, this necessarily implies that there are some hubs and backbones which connect different regional networks, strongly decreasing the value of $\langle d \rangle$. The small-world evidence is strengthened by the empirical finding that the clustering coefficients of the AS, AS+, and IR maps are about four orders of magnitude larger than the corresponding value for a random graph of the same size, $O(N^{-1})$. As discussed above, this implies that neighbors of the same node are very likely to be connected to each other as well. The high clustering coefficient of the Internet maps is probably due to geographical constraints. In the graph representation all links are equivalent, yet the physical connections are characterized by a length in real space. The longer this length, the higher the cost of installation and maintenance of the physical line, which favors the preferential connection between nearby nodes. It is likely that nodes within the same geographical region will have a large number of connections among them, increasing in this way the clustering coefficient.
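The random-graph baseline used in this comparison is easy to check numerically. The sketch below, an illustration of ours rather than part of the original analysis, builds an Erdős-Rényi graph in which each pair of nodes is linked independently with probability $p = \langle k \rangle/(N-1)$ and measures its average clustering, which should come out close to $p$ and hence shrink as $1/N$ at fixed $\langle k \rangle$. The graph size, seed, and the convention $c_i = 0$ for $k_i < 2$ are assumptions.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=0):
    """G(n, p): each of the n(n-1)/2 pairs is linked independently with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def average_clustering(adj):
    """Mean of c_i = 2 e_i / (k_i (k_i - 1)) over all nodes (c_i = 0 if k_i < 2)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            e = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            total += 2.0 * e / (k * (k - 1))
    return total / len(adj)

n, k_avg = 1000, 10.0
p = k_avg / (n - 1)
g = erdos_renyi(n, p)
print(f"<c> = {average_clustering(g):.4f}, p = {p:.4f}")
```

Doubling $N$ at the same $\langle k \rangle$ roughly halves $\langle c \rangle$ in this model, whereas the clustering values in Table 1 stay of order $10^{-2}$ to $10^{-1}$ for maps with $10^4$ to $10^5$ nodes.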

Another measure of interest is given by the number of minimum paths that pass through each node. To go from one node in the network to another following the minimum path, a sequence of nodes is visited. If we do this for every pair of nodes in the network, there will be a certain number of key nodes that are visited more often than others. Such nodes will be of great importance for the transmission of information across the network. This can be quantified by means of the betweenness $b_i$, i.e. the number of minimum paths that go through each node $i$. This quantity was introduced in the analysis of social networks in Ref. [...] and has more recently been studied for the AS maps under the name of load [...]. An algorithm to compute the betweenness has been described in Ref. [...]. For a star network the betweenness takes its maximum value $N(N-1)/2$ at the central node and its minimum value $N-1$ at the leaves of the star. The average betweenness of the AS, AS+, and IR maps analyzed here is $O(N)$, as shown in Table 1. In the case of the AS and AS+ maps, even though the enriched map has a much larger number of edges, the average measures are very similar.
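As a stand-in for the algorithm referenced above, which is not reproduced here, the sketch below uses Brandes-style dependency accumulation, a well-known technique that is not necessarily the one of the cited reference. The counting convention is our assumption, chosen so that the star-network values quoted in the text are reproduced: each unordered pair of nodes is counted once, a pair with several minimum paths splits its weight equally among them, and the two endpoints are counted as lying on the path.

```python
from collections import deque

def betweenness(adj):
    """Shortest-path betweenness b_i with Brandes-style dependency accumulation.

    Convention (chosen to reproduce the star-network values in the text):
    each unordered pair {s, t} is counted once, it contributes
    sigma_st(i) / sigma_st to every node i on its shortest paths, and the
    endpoints s and t are counted as lying on the path.
    """
    inner = {v: 0.0 for v in adj}   # contributions as an intermediate node
    ends = {v: 0 for v in adj}      # contributions as an endpoint
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                inner[w] += delta[w]
                ends[w] += 1        # w is the far endpoint of the pair {s, w}
    # Each unordered pair was swept from both ends, so halve the inner part;
    # each node's endpoint contribution was recorded once per pair.
    return {v: inner[v] / 2.0 + ends[v] for v in adj}

# Star with N = 6 nodes: center 0 and leaves 1..5.
N = 6
star = {0: set(range(1, N)), **{i: {0} for i in range(1, N)}}
b = betweenness(star)
print(b[0], b[1])  # 15.0 5.0, i.e. N(N-1)/2 and N-1
```

Under this convention the betweenness values summed over all nodes equal $\sum_{\text{pairs}} (d_{ij} + 1)$, which is a useful consistency check. Library implementations often exclude endpoints and normalize by default, so their numbers differ from these by an offset and a scale factor.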

While some metrics are very similar (for instance, the average betweenness $\langle b \rangle$), the differences among others are consistent with the fact that the AS and AS+ maps are a coarse-grained representation of the IR map. The IR level map is, for instance, sparser, and its average minimum path distance is larger. The IR map has a small average connectivity because routers have a finite capacity and, therefore, can have only a limited number of connections. On the contrary, ASs can in principle have any number of connections, since they represent the aggregation of a large number of routers. This implies that AS maps have a greater number of nodes with a high number of connections (hubs), providing the shortcuts needed to produce a small average minimum path distance.

4. SCALE-FREE PROPERTIES 

The analysis of the average measures presented in the previous section makes clear that the Internet does not resemble a star-shaped architecture with just a few gigantic hubs and a multitude of singly connected nodes. The same measurements rule out as well the possibility of a random graph structure or a regular grid architecture. This evidence suggests a peculiar topology that will be clearly identified by looking at the detailed distributions. In particular, Faloutsos et al. [...] pointed out for the first time that the connectivity properties of the Internet AS maps are characterized by a probability that a node has $k$ links of the form $p_k \sim k^{-\gamma}$, where $\gamma \simeq 2.1$ is a characteristic exponent. This behavior signals the presence of scale-free connectivity properties; i.e. there is no characteristic connectivity above which the probability decays exponentially to zero. In other words, there is a statistically significant probability that a node has a very large number of connections compared to the average connectivity $\langle k \rangle$. In addition, the implicit divergence of $\langle k^2 \rangle$ (which occurs for $\gamma < 3$ in an infinitely large network) signals the extreme heterogeneity of the connectivity pattern, since it implies that statistical fluctuations are unbounded. The work of Faloutsos et al. was followed by different studies of AS maps [...], AS+ maps [...], and IR maps [...]. Here, we will revisit the analysis of scale-free properties in recent AS, AS+, and IR level maps.

We start by considering the integrated connectivity distribution $P_k = \mathrm{Prob}[k_i > k]$. In the case of a pure power-law probability distribution $p_k \sim k^{-\gamma}$, we expect the functional behavior $P_k = a k^{1-\gamma}$, where $a$ is a normalization constant. In Fig. 2 we show the integrated connectivity distribution for the AS, AS+, and IR maps. For the AS map a clear power-law decay with exponent $\gamma = 2.1 \pm 0.1$ is observed, as already reported elsewhere [1], [27]. The distribution is also stable in time, as found by analyzing different time snapshots of the AS level maps obtained by the NLANR. As noted in Ref. [...], the connectivity distribution of the AS+ enriched data deviates from a pure power law at intermediate connectivities. This anomaly might or might not be related to a biased enrichment of the Internet sampling (see Ref. [...]). While this represents an important point in the detailed description of the connectivity properties, it is not critical concerning the scale-free nature of the Internet. As far as the physical properties of the network are concerned, only the large-connectivity region is actually effective. Indeed, recent studies of network resilience to the removal of nodes [...] and of virus spreading [...] have shown that the relevant parameter is the ratio $\kappa = \langle k^2 \rangle / \langle k \rangle$ between the second and the first moments of the connectivity distribution. If $\kappa \gg 1$, the network manifests properties that are not observed for networks with exponentially bounded connectivity distributions. For instance, we can randomly remove practically all the nodes in the network and a giant connected component [...] will still exist. In both the AS and AS+ maps we observe a wide connectivity distribution, with the same dependence for very large $k$. The factor $\kappa$ is mainly determined by the tail of the distribution, and is very similar for both maps. In particular, we estimate $\kappa \simeq 265$ and $\kappa \simeq 222$ for the AS and AS+ maps, respectively. With such large values, for all practical purposes (resilience, virus spreading, traffic, etc.) the AS and AS+ maps behave almost identically.

Figure 2: Integrated connectivity distribution $P_k = \mathrm{Prob}[k_i > k]$ for the AS, AS+, and IR maps. The solid line corresponds to a power-law decay $P_k \sim k^{1-\gamma}$ with exponent $\gamma = 2.1$.
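Both quantities used in this discussion can be computed directly from a degree sequence. The sketch below, with hypothetical degree sequences of our own, returns the integrated distribution $P_k = \mathrm{Prob}[k_i > k]$ at each observed connectivity and the ratio $\kappa = \langle k^2 \rangle / \langle k \rangle$. It illustrates how a small fraction of well-connected nodes raises $\kappa$ at fixed $\langle k \rangle$.

```python
from collections import Counter

def integrated_distribution(degrees):
    """Return sorted pairs (k, P_k) with P_k = Prob[k_i > k] at each observed k."""
    n = len(degrees)
    counts = Counter(degrees)
    result, above = [], n
    for k in sorted(counts):
        above -= counts[k]          # nodes with connectivity strictly greater than k
        result.append((k, above / n))
    return result

def kappa(degrees):
    """Ratio <k^2>/<k> of the second to the first moment of the degree sequence."""
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    return k2 / k1

# Two hypothetical degree sequences with the same mean <k> = 4.
homogeneous = [4] * 100
hub_dominated = [2] * 90 + [22] * 10
print(kappa(homogeneous), kappa(hub_dominated))  # 4.0 13.0
print(integrated_distribution(hub_dominated))    # [(2, 0.1), (22, 0.0)]
```

For a pure power law the pairs $(k, P_k)$ fall on a straight line of slope $1-\gamma$ in a log-log plot. Because $\kappa$ is dominated by the largest connectivities, maps that differ only at intermediate $k$, as the AS and AS+ maps do, end up with similar values.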

The connectivity distribution of the IR level map has a power-law behavior that is, however, smoothed by a clear exponential cut-off. The existence of a power-law tendency for small connectivities is better seen in the probability distribution $p_k = \mathrm{Prob}[k_i = k]$, shown in Fig. 3. A power-law fit of the form $p_k = a(\gamma - 1)k^{-\gamma}$ (the derivative $-dP_k/dk$ of $P_k = a k^{1-\gamma}$) for $k < 300$ yields the exponent $\gamma = 2.1 \pm 0.1$, in perfect agreement with the exponent found for the integrated connectivity distribution of the AS map. Nevertheless, for $k \gg 50$ the integrated connectivity distribution of the IR map follows a faster decay. This picture is consistent with a finite-size scaling form $p_k \simeq k^{-\gamma} f(k/k_c)$ [...]. Here $k_c$ is a characteristic connectivity beyond which the distribution decays faster than a power law, and $f(x)$ has the asymptotic behavior $f(x) = \text{const}$ for $x \ll 1$ and $f(x) \ll 1$ for $x \gg 1$. Deviations from power-law behavior at large connectivities have also been observed for the larger maps reported in Ref. [...]. In that work, the integrated probability distribution is fitted to the Weibull
distribution $P_k = a \exp[-(k/k_c)^{\beta}]$. While we do not want to enter into the details of the different fitting procedures, we suggest that the more general fitting form $p_k = k^{-\gamma} f(k/k_c)$, in which $\gamma$ is an independent fitting parameter, is likely a better option.

Figure 3: Connectivity distribution $p_k = \mathrm{Prob}[k_i = k]$ for the IR map. The solid line is a power-law decay $p_k \sim k^{-\gamma}$ with $\gamma = 2.1$.

Figure 4: Integrated betweenness distribution $P_b = \mathrm{Prob}[b_i > b]$ for the AS, AS+, and IR maps. The solid line is a power-law decay $P_b \sim b^{1-\gamma_b}$ with $\gamma_b = 1.9$.
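The text leaves the fitting procedure open, so the following is only an illustration of one standard alternative to least-squares fits on log-log plots, not the method used for the exponents quoted here. It is the continuous maximum-likelihood (Hill-type) estimator $\hat\gamma = 1 + n / \sum_i \ln(x_i/x_{\min})$, applied to synthetic samples. It treats connectivities as continuous, which is only an approximation for integer $k$, and it does not estimate a cut-off. The second sample, thinned with an exponential factor $e^{-x/k_c}$, shows how an unmodeled cut-off biases a pure power-law estimate upward, one reason for fitting $\gamma$ and $k_c$ jointly. Sample sizes, seed, and $k_c = 50$ are arbitrary choices.

```python
import math
import random

def sample_power_law(n, gamma, x_min=1.0, cutoff=None, seed=0):
    """Continuous samples from p(x) ~ x^-gamma for x >= x_min.

    If cutoff is given, samples are thinned with acceptance probability
    exp(-x / cutoff), giving p(x) ~ x^-gamma * exp(-x / cutoff).
    """
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        # Inverse-CDF draw from a pure power law (1 - u lies in (0, 1]).
        x = x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
        if cutoff is None or rng.random() < math.exp(-x / cutoff):
            out.append(x)
    return out

def mle_exponent(xs, x_min=1.0):
    """Continuous maximum-likelihood estimate: gamma = 1 + n / sum(ln(x / x_min))."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

pure = sample_power_law(50_000, gamma=2.1)
truncated = sample_power_law(50_000, gamma=2.1, cutoff=50.0)
print(round(mle_exponent(pure), 2), round(mle_exponent(truncated), 2))
```

On the pure sample the estimate recovers $\gamma \approx 2.1$, with a statistical error of order $(\gamma-1)/\sqrt{n}$.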

The presence of truncated power laws should not be considered a surprise, since it finds a natural place in the context of scale-free phenomena. Actually, bounded scale-free distributions (i.e. power-law distributions with a cut-off) are implicitly present in every real-world system because of finite-size effects or physical constraints. Truncated power laws are also observed in other real networks, and different mechanisms have been proposed to explain the cut-off at large connectivities. We can distinguish two different kinds of cut-offs in real networks. The first is an exponential cut-off, $f(x) = \exp(-x)$, which can be explained in terms of a finite connectivity capacity of the network elements [29] or of incomplete information [...]. This is likely what is happening at the IR level, where the finite capacity constraint (the maximum number of router interfaces) is, in our opinion, the dominant mechanism affecting the tail of the connectivity distribution. In this perspective, larger and more recent samples at the IR level could present a shift in the cut-off due to improved router capabilities and larger statistical sampling. A second possibility is given by a very steep cut-off such as $f(x) = \Theta(1 - x)$, where $\Theta(x)$ is the Heaviside step function. This is what happens in growing networks with a finite number of elements. Since scale-free networks are often dynamically growing networks, this case represents a network which has grown up to a finite number of nodes $N$. The maximum connectivity $k_c$ of any node is related to the network age. The scale-free behavior is evident up to $k_c$, and the distribution then drops as a step function, since the network does not possess any node with connectivity $k$ larger than $k_c$. By inspecting Fig. 2, this second possibility appears to be realized at the AS level. Indeed, the dominant mechanism at this level is the finite size of the network, while connectivity limits are not present, since each AS is a collection of a large number of routers and can handle a very large connectivity load.
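The step-function cut-off of a growing network can be illustrated with the Barabási-Albert preferential-attachment model mentioned in the introduction. We use it here only as a generic example of growth, not as a model of the Internet proposed in this work. In the minimal sketch below, each new node attaches $m$ links to distinct existing nodes chosen with probability proportional to their current connectivity. At any given size the largest connectivity $k_c$ is finite and can only increase as the network ages; for this particular model it is known to grow roughly as $N^{1/2}$. The values $m = 2$, $N = 20000$ and the seed are arbitrary choices.

```python
import random

def grow_network(n, m=2, seed=0):
    """Barabasi-Albert-style growth: each new node attaches m links to distinct
    existing nodes chosen with probability proportional to their degree.

    Returns the degree list, the edge list, and a dict mapping each network
    size to the largest degree present at that size.
    """
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]  # seed: complete graph
    degree = [m] * (m + 1)
    # Node v appears degree[v] times, so a uniform draw is degree-proportional.
    stubs = [v for edge in edges for v in edge]
    k_max = {m + 1: m}
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        degree.append(m)
        for t in targets:
            edges.append((new, t))
            degree[t] += 1
            stubs.extend((new, t))
        k_max[new + 1] = max(k_max[new], *(degree[t] for t in targets))
    return degree, edges, k_max

degree, edges, k_max = grow_network(20_000, m=2)
for size in (100, 1_000, 10_000, 20_000):
    print(size, k_max[size])
```

No node ever exceeds the recorded $k_c$ of the current size, so the connectivity distribution of such a network ends abruptly rather than decaying exponentially.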



The connection between finite capacity and bounded distributions also becomes evident if we consider the betweenness. This quantity is a static estimate of the amount of traffic that a node supports. Hence, if a router has a bounded capacity, the betweenness distribution should also be bounded at large betweenness. On the contrary, this effect should be absent for the AS maps. The integrated betweenness distribution $P_b = \mathrm{Prob}[b_i > b]$ for the AS, AS+, and IR maps is shown in Fig. 4. The AS and AS+ distributions are practically the same, and they are well fitted by a power law $P_b \sim b^{1-\gamma_b}$ with an exponent $\gamma_b = 1.9 \pm 0.1$. In the case of the IR map, on the other hand, the betweenness distribution follows a truncated power law, in analogy with what is observed for the connectivity distribution. The betweenness distribution, therefore, corroborates the equivalence between the AS and AS+ maps, and the existence of truncated power laws for the IR map.

Finally, it is worth stressing that, while the power-law truncation is an expected feature of finite systems, the scale-free regime is the important signature of an emergent cooperative behavior in the dynamical evolution of the Internet. This dynamics therefore plays a central role in understanding and modeling the Internet. In this perspective, the development of a statistical mechanics approach to complex networks [10] is providing a new dynamical framework in which the distinctive statistical regularities of the Internet can be understood in terms of the basic processes ruling the appearance or disappearance of nodes and links.

5. HIERARCHY AND CORRELATIONS 

Figure 5: Average clustering coefficient as a function of the node connectivity for the AS, AS+, and IR maps. The solid line is given by the power-law decay $\langle c \rangle_k \sim k^{-0.75}$. The horizontal dashed line marks the average clustering coefficient $\langle c \rangle = 0.03$ computed for the IR map.

Figure 6: Nearest neighbors average connectivity for the AS, AS+, and IR maps. The solid line is given by the power-law decay $\langle k_{nn} \rangle_k \sim k^{-0.55}$. The horizontal dashed line marks the value in the absence of correlations, $\langle k_{nn} \rangle_k^0 = \langle k^2 \rangle / \langle k \rangle = 26.9$, computed for the IR map.

The topological metrics analyzed so far distinguish between the AS and IR maps with respect to the large connectivity and betweenness properties. The difference becomes, however, more evident if we consider properties related to the existence of hierarchy and correlations. The primary known structural difference in the Internet is the distinction between stub and transit domains. Nodes in stub domains have links that go only through the domain itself. Stub domains, in turn, are connected via a gateway node to transit domains which, on the contrary, are fairly well interconnected via many paths. This hierarchy
can be schematically divided into international connections, national backbones, regional networks, and local area networks. Nodes providing access to international connections or national backbones are of course at the top level of this hierarchy, since they make possible the communication between regional and local area networks. Moreover, in this way a small average minimum path length can be achieved with a small average connectivity. This hierarchical structure will introduce some correlations in the network structure, and it is an important issue to understand how these features manifest at the topological level. In order to uncover the presence of hierarchies in Internet maps we introduce two metrics based on the clustering coefficient and the nearest neighbor average connectivity [...].

The previously defined clustering coefficient is the average 
probability that two neighbors I and m of a node i are con- 
nected. Let us consider the adjacency matrix aij, that in- 
dicates whether there is a connection between the nodes 
i and j (aij — 1), or the connection is absent (ay = 0). 
Given the definition of the clustering coefficient, it is easy 
to see that the number of edges in the subgraph identified 
by the nearest neighbors of the node i can be computed as 
e, = (1/2) J2im a a a im.a m i- Therefore, the clustering coef- 
ficient Cj measures the existence of correlations in the ad- 
jacency matrix, weighted by the corresponding node con- 
nectivity. In section [] we have shown that the clustering 
coefficient for the AS, AS+, and IR maps is four orders 
of magnitude larger than the one expected for a random 
graph and, therefore, that they are far from being random. 
Further information can be extracted if one computes the 
clustering coefficient as a function of the node connectivity 
Q. In Fig. ^ we plot the average clustering coefficient (c) k 
for nodes with connectivity k. In the case of the AS and 
AS+ maps this quantity follows a similar trend that can 
be approximated by a power law decay with an exponent 
around 0.75. For the IR map, however, except for a sharp 
drop for large values of k, attributable to low statistics, it 



is almost constant, and equal to the average clustering co- 
efficient (c) = 0.03. This implies that, in the AS and AS+ 
maps, nodes with a small number of connections have larger 
local clustering coefficients than those with a large connec- 
tivity. This behavior is consistent with the picture described 
in the previous section of highly clustered regional networks 
sparsely interconnected by national backbones and interna- 
tional connections. The regional clusters of ASs are prob- 
ably formed by a large number of nodes with small con- 
nectivity but large clustering coefficients. Moreover, they 
should also contain nodes with large connectivities that are 
connected to the other regional clusters. These high-connectivity
nodes will in turn be connected to nodes in different clusters
that are not interconnected with each other and will therefore
have a small local clustering coefficient. By contrast, in the
IR-level map these correlations are absent. Apparently, the
domain hierarchy leaves no signature at the scale of single
routers, where geographic constraints and connectivity bounds
probably play a more important role.
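The quantities defined in this paragraph translate directly into code. The following Python sketch, using illustrative function names that are not taken from the original analysis (it is not the code used to produce the figures), computes $e_i$ by counting linked pairs of neighbors, then $c_i = 2e_i/[k_i(k_i-1)]$ and the degree-resolved average $\langle c\rangle_k$. Nodes with $k_i < 2$, for which $c_i$ is undefined, are assigned $c_i = 0$ by `clustering` and excluded from $\langle c\rangle_k$; both choices are assumed conventions.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops and duplicates are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def edges_among_neighbors(adj, i):
    """e_i = (1/2) * sum_{l,m} a_il * a_lm * a_mi."""
    nbrs = adj[i]
    # Count ordered pairs (l, m) of neighbors of i that are themselves linked;
    # every such edge is seen twice, hence the division by 2.
    ordered_pairs = sum(1 for l in nbrs for m in adj[l] if m in nbrs)
    return ordered_pairs // 2


def clustering(adj, i):
    """c_i = 2 e_i / [k_i (k_i - 1)]; returns 0.0 when k_i < 2 (assumed convention)."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    return 2.0 * edges_among_neighbors(adj, i) / (k * (k - 1))


def clustering_by_degree(adj):
    """<c>_k: mean of c_i over all nodes with k_i = k (nodes with k < 2 are skipped)."""
    total = defaultdict(float)
    count = defaultdict(int)
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        total[k] += clustering(adj, i)
        count[k] += 1
    return {k: total[k] / count[k] for k in sorted(total)}


if __name__ == "__main__":
    # Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 0.
    adj = build_adjacency([(0, 1), (1, 2), (2, 0), (0, 3)])
    print(clustering_by_degree(adj))  # {2: 1.0, 3: 0.3333333333333333}
```

In this toy graph the best-connected node (node 0) has the smallest local clustering, the same qualitative trend described above for the AS maps; a graph this small, of course, carries no quantitative information.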

These observations for the clustering coefficient are
supported by another metric related to the correlations
between node connectivities. These correlations are quantified
by the probability $p_c(q|k)$ that a node with connectivity $k$
is connected to a node with connectivity $q$. With the
available data, a direct plot of $p_c(q|k)$ turns out to be very
noisy and difficult to interpret. Thus, in Ref. [] we suggested
measuring instead the nearest neighbors average connectivity
of the nodes of connectivity $k$,
$\langle k_{nn}\rangle_k = \sum_q q\, p_c(q|k)$, and plotting it as a
function of the connectivity $k$. If there are no connectivity
correlations (i.e., for a random network), then
$p_c^0(q|k) = q\, p_q/\langle k\rangle$, where $p_q$ is the connectivity
distribution, and we obtain
$\langle k_{nn}\rangle_k^0 = \langle k^2\rangle/\langle k\rangle$, which is
independent of $k$. The corresponding plots for the AS, AS+, and IR
maps are shown in Fig. []. For the AS and AS+ maps we
observe a power-law decay over more than two decades, with a
characteristic exponent of 0.55, i.e.
$\langle k_{nn}\rangle_k \sim k^{-0.55}$, clearly indicating the existence
of correlations. By contrast, the IR map again displays
an almost constant nearest neighbors average connectivity,
very close to the value expected for a random network with
the same connectivity distribution, $\langle k_{nn}\rangle_k^0 \simeq 30$.
Again, the sharp drop at large $k$ can be attributed to the poor
statistics for such large connectivities. Therefore, in this case
too, the two levels of representation show very different features.
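The quantities in this paragraph can be estimated directly from an edge list. In the Python sketch below (function names are illustrative; this is not the authors' code), $\langle k_{nn}\rangle_k$ is obtained by averaging, over all nodes of degree $k$, the mean degree of their neighbors; this coincides with $\sum_q q\,p_c(q|k)$ when $p_c(q|k)$ is estimated as the fraction of edges leaving degree-$k$ nodes that end on degree-$q$ nodes. The uncorrelated reference value follows from substituting $p_c^0(q|k) = q\,p_q/\langle k\rangle$, which gives $\sum_q q \cdot q\,p_q/\langle k\rangle = \langle k^2\rangle/\langle k\rangle$. The decay exponent is estimated by an ordinary least-squares fit in log-log coordinates, an assumed simple choice: the fitting procedure behind the quoted exponents is not specified in this section, and on noisy data one would probably bin in $k$ before fitting.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops and duplicates are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def knn_by_degree(adj):
    """<k_nn>_k: average over degree-k nodes of the mean degree of their neighbors.

    Equals sum_q q * p_c(q|k) when p_c(q|k) is estimated as the fraction of
    edges leaving degree-k nodes that end on degree-q nodes.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue
        total[k] += sum(len(adj[j]) for j in nbrs) / k
        count[k] += 1
    return {k: total[k] / count[k] for k in sorted(total)}


def uncorrelated_knn(adj):
    """Reference value <k^2>/<k> expected in the absence of degree correlations."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    mean_k = sum(degrees) / len(degrees)
    mean_k2 = sum(d * d for d in degrees) / len(degrees)
    return mean_k2 / mean_k


def loglog_slope(values_by_k):
    """Ordinary least-squares slope of log(value) versus log(k)."""
    pts = [(math.log(k), math.log(v)) for k, v in values_by_k.items() if k > 0 and v > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx


if __name__ == "__main__":
    # Toy star: hub 0 linked to four leaves, so high-degree nodes see low-degree neighbors.
    adj = build_adjacency([(0, leaf) for leaf in range(1, 5)])
    knn = knn_by_degree(adj)
    print(knn)                    # {1: 4.0, 4: 1.0}
    print(uncorrelated_knn(adj))  # 2.5
    print(loglog_slope(knn))      # approximately -1.0
```

In the star, $\langle k_{nn}\rangle_k$ decreases with $k$, a caricature of the correlated behavior seen at the AS level, whereas a regular ring (every node of degree 2) gives a flat profile equal to $\langle k^2\rangle/\langle k\rangle$, as expected without correlations.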

It is worth remarking that the present analysis of the
hierarchical and correlation properties shows a very good
consistency of results for the AS and AS+ maps.
This points to a robustness of these features, which can thus
be considered general properties at the AS level. On
the other hand, the IR map shows a marked difference that
must be accounted for when developing topology generators.
In other words, Internet protocols working at different
representation levels must be thought of as working on different
topologies. Topology generators must likewise include these
differences, depending on the level at which we intend to
model the Internet topology.

6. CONCLUSIONS 

The increasing availability of larger Internet maps and the
proliferation of growing network models with scale-free
features have recently stimulated a more detailed statistical
analysis aimed at identifying distinctive metrics and features
of the Internet topology. In this respect, in the present work
we have presented a detailed statistical analysis of several
metrics on Internet maps collected at the router and autonomous
system levels. Our analysis confirms the presence of a power-law
(scale-free) behavior for the connectivity distribution, as well
as for the betweenness distribution, which can be associated with
a measure of the load of the nodes in the maps. The exponential
cut-offs observed in the IR maps, associated with the limited
capacity of the routers, are absent at the AS level, whose nodes
conglomerate a large number of routers and are thus able to bear
a larger load. The analysis of the clustering coefficient and the
nearest neighbors average connectivity shows in a quantitative
way the presence of strong correlations in the Internet
connectivity at the AS level, correlations that can be related to
the hierarchical organization of this network. These correlations,
on the other hand, seem to be nonexistent at the IR level. The
correlation properties clearly indicate the presence of strong
differences between the IR and AS levels of representation. Our
findings represent a step forward in the characterization of the
Internet topology and will be helpful both for scrutinizing more
thoroughly the actual validity of the network models proposed so
far and as an ingredient in the elaboration of new and more
realistic Internet topology generators. A first step in this
direction has already been taken in the network model proposed
in Ref. [].